Take-home Exercise 2

Author

QIU RUILIU

Published

December 6, 2023

Modified

December 15, 2023

Setting the Scene

The inquiry focuses on the key motivators prompting city residents to rise early for their daily commutes from home to work, and the consequences of discontinuing public bus services along specific routes. These issues represent significant challenges for transport operators and urban planners.

Traditionally, understanding these dynamics involved conducting extensive commuter surveys. These surveys, however, are expensive, time-intensive, and laborious. Moreover, the data collected often requires extensive processing and analysis, leading to reports that are frequently outdated by the time they are completed.

With the digitalization of urban infrastructure, including public buses, mass rapid transit systems, public utilities, and roads, new opportunities for data collection arise. The integration of pervasive computing technologies like GPS in vehicles and SMART cards among public transport users allows for detailed tracking of movement patterns across time and space.

Despite this, the rapid accumulation of geospatial data has overwhelmed planners’ capacity to effectively analyze and convert it into valuable insights. This inefficiency negatively impacts the return on investment in data collection and management.

Motivation and Objective

The purpose of this take-home project is twofold. First, it addresses the gap in applied research demonstrating the integration, analysis, and modeling of the increasingly available open data for effective policy-making. Despite the abundance of such data, there is a noticeable absence of practical studies showcasing its potential use in policy decisions.

Second, the project aims to fill the void in practical research illustrating the application of geospatial data science and analysis (GDSA) in decision-making processes.

Therefore, the assignment involves conducting a case study to showcase the value of GDSA. This will involve synthesizing publicly accessible data from various sources to construct spatial interaction models. These models will be used to identify and analyze factors influencing the urban mobility patterns of public bus transit.

Getting Started

pacman::p_load(tmap, sf, dplyr, DT,
               stplanr, performance, mapview,
               ggpubr, tidyverse)

Data Importing

Geospatial Data Importing

busstop <- st_read(dsn = "data/geospatial",
                   layer = "BusStop") %>%
  st_transform(crs = 3414)
Reading layer `BusStop' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
mpsz <- st_read(dsn = "data/geospatial",
                   layer = "MPSZ-2019") %>%
  st_transform(crs = 3414)
Reading layer `MPSZ-2019' from data source 
  `D:\KathyChiu77\ISSS624\Take-home Ex\Take-home Ex 2\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 103.6057 ymin: 1.158699 xmax: 104.0885 ymax: 1.470775
Geodetic CRS:  WGS 84
mpsz
Simple feature collection with 332 features and 6 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM
First 10 features:
                 SUBZONE_N SUBZONE_C       PLN_AREA_N PLN_AREA_C       REGION_N
1              MARINA EAST    MESZ01      MARINA EAST         ME CENTRAL REGION
2         INSTITUTION HILL    RVSZ05     RIVER VALLEY         RV CENTRAL REGION
3           ROBERTSON QUAY    SRSZ01  SINGAPORE RIVER         SR CENTRAL REGION
4  JURONG ISLAND AND BUKOM    WISZ01  WESTERN ISLANDS         WI    WEST REGION
5             FORT CANNING    MUSZ02           MUSEUM         MU CENTRAL REGION
6         MARINA EAST (MP)    MPSZ05    MARINE PARADE         MP CENTRAL REGION
7                   SUDONG    WISZ03  WESTERN ISLANDS         WI    WEST REGION
8                  SEMAKAU    WISZ02  WESTERN ISLANDS         WI    WEST REGION
9           SOUTHERN GROUP    SISZ02 SOUTHERN ISLANDS         SI CENTRAL REGION
10                 SENTOSA    SISZ01 SOUTHERN ISLANDS         SI CENTRAL REGION
   REGION_C                       geometry
1        CR MULTIPOLYGON (((33222.98 29...
2        CR MULTIPOLYGON (((28481.45 30...
3        CR MULTIPOLYGON (((28087.34 30...
4        WR MULTIPOLYGON (((14557.7 304...
5        CR MULTIPOLYGON (((29542.53 31...
6        CR MULTIPOLYGON (((35279.55 30...
7        WR MULTIPOLYGON (((15772.59 21...
8        WR MULTIPOLYGON (((19843.41 21...
9        CR MULTIPOLYGON (((30870.53 22...
10       CR MULTIPOLYGON (((26879.04 26...
mpsz <- write_rds(mpsz, "data/rds/mpsz.rds")

Aspatial Data Importing

odbus <- read_csv("data/aspatial/origin_destination_bus_202310.csv")
glimpse(odbus)
Rows: 5,694,297
Columns: 7
$ YEAR_MONTH          <chr> "2023-10", "2023-10", "2023-10", "2023-10", "2023-…
$ DAY_TYPE            <chr> "WEEKENDS/HOLIDAY", "WEEKDAY", "WEEKENDS/HOLIDAY",…
$ TIME_PER_HOUR       <dbl> 16, 16, 14, 14, 17, 17, 17, 7, 14, 14, 10, 20, 20,…
$ PT_TYPE             <chr> "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "BUS", "…
$ ORIGIN_PT_CODE      <chr> "04168", "04168", "80119", "80119", "44069", "2028…
$ DESTINATION_PT_CODE <chr> "10051", "10051", "90079", "90079", "17229", "2014…
$ TOTAL_TRIPS         <dbl> 3, 5, 3, 5, 4, 1, 24, 2, 1, 7, 3, 2, 5, 1, 1, 1, 1…
odbus$ORIGIN_PT_CODE <- as.factor(odbus$ORIGIN_PT_CODE)
odbus$DESTINATION_PT_CODE <- as.factor(odbus$DESTINATION_PT_CODE) 

Extracting the Study Data

weekday6_9 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 &
           TIME_PER_HOUR <= 9) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(weekday6_9)

We will save the output in rds format for future used.

write_rds(weekday6_9, "data/rds/weekday6_9.rds")

The code chunk below will be used to import the save weekday6_9.rds into R environment.

weekday6_9 <- read_rds("data/rds/weekday6_9.rds")
weekday17_20 <- odbus %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 &
           TIME_PER_HOUR <= 20) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(weekday17_20)
write_rds(weekday17_20, "data/rds/weekday17_20.rds")
weekday17_20 <- read_rds("data/rds/weekday17_20.rds")
weekend11_14 <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 &
           TIME_PER_HOUR <= 14) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(weekend11_14)
write_rds(weekend11_14, "data/rds/weekend11_14.rds")
weekend11_14 <- read_rds("data/rds/weekend11_14.rds")
weekend16_19 <- odbus %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 &
           TIME_PER_HOUR <= 19) %>%
  group_by(ORIGIN_PT_CODE,
           DESTINATION_PT_CODE) %>%
  summarise(TRIPS = sum(TOTAL_TRIPS))
datatable(weekend16_19)
write_rds(weekend16_19, "data/rds/weekend16_19.rds")
weekend16_19 <- read_rds("data/rds/weekend16_19.rds")

Geospatial Data Wrangling

Combining Busstop and mpsz

busstop_mpsz <- st_intersection(busstop, mpsz) %>%
  select(BUS_STOP_N, SUBZONE_C)
mapview(busstop_mpsz)

Creating Hexagon Data

Create hexagonal grid based on the extent of mpsz dataset

hex_grid <- st_make_grid(mpsz, cellsize = 2 * 375 / sqrt(3), square = FALSE)

Convert the grid to an sf object and add an ID for each hexagon

hex_grid_sf <- st_sf(geometry = hex_grid) %>%
  mutate(hex_id = 1:length(hex_grid))

Count the number of bus stops in each hexagon

busstop_counts <- st_intersects(hex_grid_sf, busstop_mpsz, sparse = FALSE) %>% 
  apply(1, function(x) sum(x, na.rm = TRUE))

Add bus stop count to hex grid sf object

hex_grid_sf$bus_stop_count <- busstop_counts

Filter out hexagons with no bus stops

hex_grid_sf <- hex_grid_sf %>%
  filter(bus_stop_count > 0)
tmap_options(check.and.fix = TRUE)
map_hexagon <- tm_shape(hex_grid_sf) +
  tm_polygons(
    col = "bus_stop_count",
    palette = "Purples",
    style = "cont",
    title = "Number of Bus Stops",
    id = "hex_id",
    showNA = FALSE,
    alpha = 0.6,
    popup.vars = c("Number of Bus Stops" = "bus_stop_count"),
    popup.format = list(bus_stop_count = list(format = "f", digits = 0))
  ) +
  tm_shape(mpsz) + 
  tm_borders(col = "grey75", lwd = 0.7) +
  tm_layout(
    main.title = "Bus Stop Distribution in Singapore",
    main.title.size = 1.5,
    legend.title.size = 1,
    legend.text.size = 0.8,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Data Source: LTA DataMall, Data.gov.sg", position = c("RIGHT", "BOTTOM"), size = 0.8) +
  tm_view(view.legend.position = c("left", "bottom"))

map_hexagon

Next, we are going to append the planning subzone code from busstop_mpsz data frame onto odbus6_9 data frame.

busstop_hex <- st_intersection(busstop, hex_grid_sf) %>%
  select(BUS_STOP_N, hex_id) %>%
  st_drop_geometry()
busstop_hex <- na.omit(busstop_hex)
head(busstop_hex)
     BUS_STOP_N hex_id
3269      25059    393
254       26379    444
2570      25751    488
4203      26389    490
2403      26369    491
2897      25761    535
write_rds(busstop_hex, "data/rds/busstop_hex.rds")  

Preparing Flow Data

Creating Flow Data

weekday_morning_od <- left_join(weekday6_9 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEX = hex_id,
         DESTIN_BS = DESTINATION_PT_CODE)

Before continue, it is a good practice for us to check for duplicating records.

duplicate <- weekday_morning_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()

If duplicated records are found, the code chunk below will be used to retain the unique records.

weekday_morning_od <- unique(weekday_morning_od)
weekday_morning_od <- left_join(weekday_morning_od , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N"))
duplicate <- weekday_morning_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekday_morning_od <- unique(weekday_morning_od)
weekday_morning_od <- weekday_morning_od %>%
  rename(DESTIN_HEX = hex_id) %>%
  drop_na() %>%
  group_by(ORIGIN_HEX, DESTIN_HEX) %>%
  summarise(WEEKDAY_MORNING_PEAK = sum(TRIPS))

It is time to save the output into an rds file format.

write_rds(weekday_morning_od, "data/rds/weekday_morning_od.rds")
weekday_morning_od <- read_rds("data/rds/weekday_morning_od.rds")
weekday_afternoon_od <- left_join(weekday17_20 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEX = hex_id,
         DESTIN_BS = DESTINATION_PT_CODE)
duplicate <- weekday_afternoon_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekday_afternoon_od <- unique(weekday_afternoon_od)
weekday_afternoon_od <- left_join(weekday_afternoon_od , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N"))
duplicate <- weekday_afternoon_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekday_afternoon_od <- unique(weekday_afternoon_od)
weekday_afternoon_od <- weekday_afternoon_od %>%
  rename(DESTIN_HEX = hex_id) %>%
  drop_na() %>%
  group_by(ORIGIN_HEX, DESTIN_HEX) %>%
  summarise(WEEKDAY_AFTERNOON_PEAK = sum(TRIPS))
write_rds(weekday_afternoon_od, "data/rds/weekday_afternoon_od.rds")
weekday_afternoon_od <- read_rds("data/rds/weekday_afternoon_od.rds")

Weekend Morning Peak

weekend_morning_od <- left_join(weekend11_14 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEX = hex_id,
         DESTIN_BS = DESTINATION_PT_CODE)
duplicate <- weekend_morning_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekend_morning_od <- unique(weekend_morning_od)
weekend_morning_od <- left_join(weekend_morning_od , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N"))
duplicate <- weekend_morning_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekend_morning_od <- unique(weekend_morning_od)
weekend_morning_od <- weekend_morning_od %>%
  rename(DESTIN_HEX = hex_id) %>%
  drop_na() %>%
  group_by(ORIGIN_HEX, DESTIN_HEX) %>%
  summarise(WEEKEND_MORNING_PEAK = sum(TRIPS))
write_rds(weekend_morning_od, "data/rds/weekend_morning_od.rds")
weekend_morning_od <- read_rds("data/rds/weekend_morning_od.rds")

Weekend Evening Peak

weekend_evening_od <- left_join(weekend16_19 , busstop_hex,
            by = c("ORIGIN_PT_CODE" = "BUS_STOP_N")) %>%
  rename(ORIGIN_BS = ORIGIN_PT_CODE,
         ORIGIN_HEX = hex_id,
         DESTIN_BS = DESTINATION_PT_CODE)
duplicate <- weekend_evening_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekend_evening_od <- unique(weekend_evening_od)
weekend_evening_od <- left_join(weekend_evening_od , busstop_hex,
            by = c("DESTIN_BS" = "BUS_STOP_N"))
duplicate <- weekend_evening_od %>%
  group_by_all() %>%
  filter(n()>1) %>%
  ungroup()
weekend_evening_od <- unique(weekend_evening_od)
weekend_evening_od <- weekend_evening_od %>%
  rename(DESTIN_HEX = hex_id) %>%
  drop_na() %>%
  group_by(ORIGIN_HEX, DESTIN_HEX) %>%
  summarise(WEEKEND_EVENING_PEAK = sum(TRIPS))
write_rds(weekend_evening_od, "data/rds/weekend_evening_od.rds")
weekend_evening_od <- read_rds("data/rds/weekend_evening_od.rds")

Visualising Spatial Interaction

Removing Intra-zonal Flows

weekday_morning_od1 <- weekday_morning_od[weekday_morning_od$ORIGIN_HEX!=weekday_morning_od$DESTIN_HEX,]
weekday_afternoon_od1 <- weekday_afternoon_od[weekday_afternoon_od$ORIGIN_HEX!=weekday_afternoon_od$DESTIN_HEX,]
weekend_morning_od1 <- weekend_morning_od[weekend_morning_od$ORIGIN_HEX!=weekend_morning_od$DESTIN_HEX,]
weekend_evening_od1 <- weekend_evening_od[weekend_evening_od$ORIGIN_HEX!=weekend_evening_od$DESTIN_HEX,]

Creating Desire Lines

weekday_morning_flowLine <- od2line(flow = weekday_morning_od1, 
                    zones = hex_grid_sf,
                    zone_code = "hex_id")
weekday_afternoon_flowLine <- od2line(flow = weekday_afternoon_od1, 
                    zones = hex_grid_sf,
                    zone_code = "hex_id")
weekend_morning_flowLine <- od2line(flow = weekend_morning_od1, 
                    zones = hex_grid_sf,
                    zone_code = "hex_id")
weekend_evening_flowLine <- od2line(flow = weekend_evening_od1, 
                    zones = hex_grid_sf,
                    zone_code = "hex_id")

Visualising the Desire Lines

tm_shape(hex_grid_sf) +
  tm_polygons(
    border.col = "grey50", 
    border.alpha = 0.6, 
    alpha = 0.1
  ) +
  tm_shape(weekday_morning_flowLine %>% 
             filter(WEEKDAY_MORNING_PEAK >= 5000)) +
  tm_lines(
    lwd = "WEEKDAY_MORNING_PEAK",
    style = "quantile",
    scale = c(0.1, 1, 3, 5, 7, 10),
    n = 6,
    alpha = 0.5,
    palette = "Blues"
  ) +
  tm_shape(mpsz) +
  tm_borders(
    col = "darkblue", 
    alpha = 0.1,
    lwd = 1.5
  ) +
  tm_layout(
    main.title = "Weekday Morning Commute Flows in Singapore",
    main.title.position = "center",
    main.title.size = 1.0,
    legend.title.size = 0.8,
    legend.text.size = 0.7,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)

tm_shape(hex_grid_sf) +
  tm_polygons(
    border.col = "grey50", 
    border.alpha = 0.6, 
    alpha = 0.1
  ) +
  tm_shape(weekday_afternoon_flowLine %>% 
             filter(WEEKDAY_AFTERNOON_PEAK >= 5000)) +
  tm_lines(
    lwd = "WEEKDAY_AFTERNOON_PEAK",
    style = "quantile",
    scale = c(0.1, 1, 3, 5, 7, 10),
    n = 6,
    alpha = 0.5,
    palette = "Blues"
  ) +
  tm_shape(mpsz) +
  tm_borders(
    col = "darkblue", 
    alpha = 0.1,
    lwd = 1.5
  ) +
  tm_layout(
    main.title = "Weekday Afternoon Commute Flows in Singapore",
    main.title.position = "center",
    main.title.size = 1.0,
    legend.title.size = 0.8,
    legend.text.size = 0.7,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)

tm_shape(hex_grid_sf) +
  tm_polygons(
    border.col = "grey50", 
    border.alpha = 0.6, 
    alpha = 0.1
  ) +
  tm_shape(weekend_morning_flowLine %>% 
             filter(WEEKEND_MORNING_PEAK >= 5000)) +
  tm_lines(
    lwd = "WEEKEND_MORNING_PEAK",
    style = "quantile",
    scale = c(0.1, 1, 3, 5, 7, 10),
    n = 6,
    alpha = 0.5,
    palette = "Blues"
  ) +
  tm_shape(mpsz) +
  tm_borders(
    col = "darkblue", 
    alpha = 0.1,
    lwd = 1.5
  ) +
  tm_layout(
    main.title = "Weekend Morning Commute Flows in Singapore",
    main.title.position = "center",
    main.title.size = 1.0,
    legend.title.size = 0.8,
    legend.text.size = 0.7,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)

tm_shape(hex_grid_sf) +
  tm_polygons(
    border.col = "grey50", 
    border.alpha = 0.6, 
    alpha = 0.1
  ) +
  tm_shape(weekend_evening_flowLine %>% 
             filter(WEEKEND_EVENING_PEAK >= 5000)) +
  tm_lines(
    lwd = "WEEKEND_EVENING_PEAK",
    style = "quantile",
    scale = c(0.1, 1, 3, 5, 7, 9, 11, 13, 15),
    n = 9,
    alpha = 0.5,
    palette = "Blues"
  ) +
  tm_shape(mpsz) +
  tm_borders(
    col = "darkblue", 
    alpha = 0.1,
    lwd = 1.5
  ) +
  tm_layout(
    main.title = "Weekend Evening Commute Flows in Singapore",
    main.title.position = "center",
    main.title.size = 1.0,
    legend.title.size = 0.8,
    legend.text.size = 0.7,
    legend.position = c("left", "bottom"),
    frame = FALSE,
    inner.margins = c(0.05, 0.05, 0.05, 0.05)
  ) +
  tm_credits("Source: LTA DataMall", position = c("RIGHT", "BOTTOM"), size = 0.5)